Last Update: 2019-02-04 12:57:18
Before we start, let’s clear the workspace, set a seed for reproducibility, and load a few libraries.

```r
rm(list = ls())
set.seed(100)
options(warn = -1)  # suppress all warnings in the rendered output
library(knitr)
library(ggplot2)
library(caret)
library(doParallel)
registerDoParallel(cores = detectCores() - 1)
```

We register all but one core so that caret can train our models in parallel while leaving one core free for the rest of the system.
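If you want to check how many workers caret will actually see, or prefer a backend you can shut down explicitly, here is a minimal sketch that registers an explicit cluster from the parallel package instead of using `registerDoParallel(cores = ...)`. It assumes only doParallel (which attaches foreach and parallel); the variable names are illustrative.

```r
library(doParallel)  # also attaches foreach and parallel

# Leave one core free; detectCores() can return NA, so fall back to 1 worker
n.cores = parallel::detectCores()
n.workers = if (is.na(n.cores)) 1L else max(1L, n.cores - 1L)

cl = makeCluster(n.workers)  # PSOCK cluster by default
registerDoParallel(cl)
cat("foreach workers:", getDoParWorkers(), "\n")

# ... train() calls would go here; trainControl() has allowParallel = TRUE by default ...

stopCluster(cl)
registerDoSEQ()  # return foreach to sequential execution
```

The explicit cluster trades one extra line of setup for a clean shutdown at the end of the session.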
Let’s read in our data.

```r
data.2015 = read.csv("data/2015.csv")
data.2016 = read.csv("data/2016.csv")
data.2017 = read.csv("data/2017.csv")
data.2018 = read.csv("data/2018.csv")
```
We will only deal with regular-season events, so let’s remove the playoff games from our datasets.

```r
# Keep only regular-season shots
get.regular.season = function(data) {
  subset(data, isPlayoffGame == 0)
}

season.2015 = get.regular.season(data.2015)
season.2016 = get.regular.season(data.2016)
season.2017 = get.regular.season(data.2017)
season.2018 = get.regular.season(data.2018)
```
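The read-and-filter steps repeat the same pattern for every season. A compact alternative, sketched below under the assumption that every file follows the `data/<year>.csv` naming used above, reads the seasons into a named list. `load.regular.seasons` is a hypothetical helper, and the demo writes two toy CSV files with made-up rows to a temporary directory so it runs without the real data.

```r
# Hypothetical helper: read <dir>/<year>.csv for each season and keep
# only regular-season shots (isPlayoffGame == 0)
load.regular.seasons = function(years, dir = "data") {
  seasons = lapply(years, function(year) {
    data = read.csv(file.path(dir, paste0(year, ".csv")))
    subset(data, isPlayoffGame == 0)
  })
  setNames(seasons, years)
}

# Toy demo with made-up rows, not real shot data
demo.dir = tempfile("seasons")
dir.create(demo.dir)
write.csv(data.frame(goal = c(0, 1, 0), isPlayoffGame = c(0, 0, 1)),
          file.path(demo.dir, "2015.csv"), row.names = FALSE)
write.csv(data.frame(goal = c(1, 0), isPlayoffGame = c(1, 1)),
          file.path(demo.dir, "2016.csv"), row.names = FALSE)

demo = load.regular.seasons(2015:2016, dir = demo.dir)
sapply(demo, nrow)
```

With the real files, `regular = load.regular.seasons(2015:2018)` would give `regular[["2015"]]` in place of `season.2015`, and so on.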
Now let’s remove the extraneous columns. At the end, we will have the following columns (I’ve renamed them for convenience):

| Old Column Name | New Column Name |
|---|---|
| xCordAdjusted | x |
| yCordAdjusted | y |
| shotAngleAdjusted | angle |
| shotDistance | dist |
| teamCode | team |
| goal | goal |
```r
# Keep only the columns we need, under shorter names
get.helpful.data = function(data) {
  data.frame(x = data$xCordAdjusted,
             y = data$yCordAdjusted,
             angle = data$shotAngleAdjusted,
             dist = data$shotDistance,
             team = data$teamCode,
             goal = data$goal)
}

analysis.2015 = get.helpful.data(season.2015)
analysis.2016 = get.helpful.data(season.2016)
analysis.2017 = get.helpful.data(season.2017)
analysis.2018 = get.helpful.data(season.2018)
```
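One subtlety in `get.helpful.data`: `data$someColumn` returns `NULL` when the column doesn’t exist, and `data.frame()` ignores `NULL` arguments, so a renamed or missing source column would disappear without an error. A minimal defensive variant, sketched below, checks the expected columns first. `get.helpful.data.checked` and the toy frame are hypothetical; the column names are the ones from the table of renamed columns.

```r
# Hypothetical defensive version of get.helpful.data: stop early if any
# expected source column is missing instead of silently dropping it
get.helpful.data.checked = function(data) {
  # new name = old column name
  columns = c(x = "xCordAdjusted", y = "yCordAdjusted",
              angle = "shotAngleAdjusted", dist = "shotDistance",
              team = "teamCode", goal = "goal")
  absent = setdiff(columns, names(data))
  if (length(absent) > 0) {
    stop("missing columns: ", paste(absent, collapse = ", "))
  }
  result = data[columns]        # select in the order listed above
  names(result) = names(columns)  # then apply the new names
  result
}

# Toy input with made-up values
toy = data.frame(xCordAdjusted = 80, yCordAdjusted = -5,
                 shotAngleAdjusted = 30, shotDistance = 12,
                 teamCode = "PIT", goal = 1, extra = "dropped")
get.helpful.data.checked(toy)
```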
Some rows have missing values. Let’s keep only the complete cases (rows with no `NA` in any column), drop the rest, and then stack the 2015 to 2017 seasons into a single training set.

```r
analysis.2015 = analysis.2015[complete.cases(analysis.2015), ]
analysis.2016 = analysis.2016[complete.cases(analysis.2016), ]
analysis.2017 = analysis.2017[complete.cases(analysis.2017), ]
analysis.2018 = analysis.2018[complete.cases(analysis.2018), ]

# Training set: the 2015-2017 regular seasons (each already complete)
analysis.all = rbind(analysis.2017, analysis.2016, analysis.2015)
```
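To see what `complete.cases()` does, here is a toy frame with made-up values: it returns one `TRUE`/`FALSE` per row, `TRUE` only when no column in that row is `NA`, so indexing with it keeps the fully observed shots.

```r
shots = data.frame(x = c(80, NA, 60, 75),
                   dist = c(12, 30, NA, 20),
                   goal = c(1, 0, 0, 0))
keep = complete.cases(shots)
keep            # TRUE FALSE FALSE TRUE
shots[keep, ]   # rows 1 and 4 survive
```

`na.omit(shots)` keeps the same rows, with an extra `na.action` attribute recording which rows were dropped.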
We’ll need a function to get a single team’s data.

```r
get.team.data = function(data, code) {
  subset(data, team == code)
}
```
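`get.team.data` extracts one team at a time. If you want every team at once, base R’s `split()` returns a named list of data frames keyed by team code. The toy frame and the name `by.team` are illustrative only.

```r
# Made-up shots for three teams
shots = data.frame(team = c("PIT", "BOS", "PIT", "T.B"),
                   dist = c(10, 25, 40, 15),
                   predict = c(0.20, 0.05, 0.02, 0.12))

# One data frame per team code
by.team = split(shots, shots$team)
names(by.team)
by.team[["PIT"]]  # the same rows as get.team.data(shots, "PIT")
```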
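With our data, we can start creating models. We’ll be creating the following models:

- a neural network (caret method `"nnet"`)
- a k-nearest neighbors model (caret method `"knn"`)

Both use x, y, angle and distance as predictors (team is excluded) and are tuned with 5-fold cross-validation repeated twice. Because `goal` is stored as 0/1 numbers, caret fits both as regression models, so each prediction is an estimated scoring probability rather than a class label.

```r
# 5-fold cross-validation, repeated twice
control = trainControl(method = "repeatedcv", number = 5, repeats = 2)

# `.` already excludes the response, so only `team` needs dropping
model.nnet = train(goal ~ . - team,
                   data = analysis.all,
                   method = "nnet",
                   trControl = control)
```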
```
## # weights: 31
## initial value 79308.617571
## iter 10 value 21110.433003
## iter 20 value 19408.328843
## iter 30 value 18788.755236
## iter 40 value 18693.310231
## iter 50 value 18597.284689
## iter 60 value 18567.292177
## iter 70 value 18548.738997
## iter 80 value 18537.793165
## iter 90 value 18523.033168
## iter 100 value 18512.476830
## final value 18512.476830
## stopped after 100 iterations
```
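The `# weights: 31` line in the trace is consistent with a single-hidden-layer network with 5 hidden units (an inference from the weight count, not something the trace states). Assuming 4 numeric inputs (x, y, angle, dist), one output unit and no skip-layer connections, each hidden unit has one weight per input plus a bias, and the output unit has one weight per hidden unit plus a bias. The helper name below is hypothetical.

```r
# Number of weights in a single-hidden-layer nnet without skip-layer links:
# each hidden unit sees every input plus a bias, and each output unit sees
# every hidden unit plus a bias
nnet.weight.count = function(n.inputs, n.hidden, n.outputs = 1) {
  (n.inputs + 1) * n.hidden + (n.hidden + 1) * n.outputs
}

nnet.weight.count(n.inputs = 4, n.hidden = 5)  # 31
```

That is, (4 + 1) × 5 = 25 input-to-hidden weights plus (5 + 1) × 1 = 6 hidden-to-output weights.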
```r
model.knn = train(goal ~ . - team,
                  data = analysis.all,
                  method = "knn",
                  trControl = control)
```
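Because `goal` is numeric, the k-nearest neighbors model predicts a new shot by averaging the `goal` values of the k closest training shots, which reads as a local scoring rate. The bare-bones base-R version below uses Euclidean distance and is meant for intuition only; it is not caret’s implementation, and the toy training data is made up. Since distances are computed on raw feature values, larger-scale features dominate unless the predictors are centered and scaled first (for example with `preProcess = c("center", "scale")` in `train()`).

```r
# Minimal k-NN regression: average the outcome of the k nearest training rows
knn.regress = function(train.x, train.y, new.x, k = 5) {
  train.x = as.matrix(train.x)
  new.x = as.matrix(new.x)
  apply(new.x, 1, function(point) {
    # Euclidean distance from this point to every training row
    d = sqrt(colSums((t(train.x) - point)^2))
    nearest = order(d)[seq_len(k)]
    mean(train.y[nearest])
  })
}

# Toy data: shot distance and whether it was a goal (made up)
train.x = data.frame(dist = c(5, 8, 10, 40, 45, 60))
train.y = c(1, 1, 0, 0, 0, 0)
pred = knn.regress(train.x, train.y, new.x = data.frame(dist = c(7, 50)), k = 3)
pred
```

The shot at distance 7 has neighbors at 8, 5 and 10 (two goals out of three), while the shot at 50 only has non-goal neighbors.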
We’ll make our predictions on the 2018 regular season (`analysis.2018`), which the models never saw during training. Here’s what a little bit of that data looks like:

```r
head(analysis.2018)
```
Now, we can use the `predict` function to get our predictions.

```r
nnet.prediction = predict(model.nnet, newdata = analysis.2018)
knn.prediction = predict(model.knn, newdata = analysis.2018)

# Attach each model's predictions to a copy of the 2018 shots
nnet.prediction.data = data.frame(analysis.2018)
nnet.prediction.data$predict = nnet.prediction
knn.prediction.data = data.frame(analysis.2018)
knn.prediction.data$predict = knn.prediction
```
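For a single-number comparison of the two models on the 2018 shots, one option is the Brier score: the mean squared difference between the predicted probability and the 0/1 outcome, where lower is better. The helper `brier.score` is a hypothetical addition; with the objects above it would be called as `brier.score(nnet.prediction.data$predict, nnet.prediction.data$goal)` and likewise for the k-NN data.

```r
# Brier score: mean squared error between predicted probabilities and 0/1 outcomes
brier.score = function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  mean((predicted - actual)^2)
}

# Toy check with made-up values
brier.score(c(0.9, 0.1, 0.2), c(1, 0, 0))  # 0.02
```

The score rewards calibration as well as ranking, which suits models whose outputs are read as scoring probabilities.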
So, our neural network predictions look like:

```r
nnet.prediction.data
```

And our k-nearest neighbors predictions look like:

```r
knn.prediction.data
```
With our predictions in hand, let’s plot each model’s predicted scoring probability against distance from the net to see how the models differ.

```r
# geom_hex() needs the hexbin package; after_stat() needs ggplot2 >= 3.3.0
plot.nnet = ggplot(nnet.prediction.data) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "orange",
           color = "grey") +
  labs(title = "Predicted Goal Probability from Neural Network Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()

plot.knn = ggplot(knn.prediction.data) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "orange",
           color = "grey") +
  labs(title = "Predicted Goal Probability from K-Nearest Neighbors Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()
```
Here is the neural network plot:

```r
plot.nnet
```

Here is the k-nearest neighbors plot:

```r
plot.knn
```
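The team sections below repeat the same `ggplot()` call with different data, colors and titles. A hypothetical helper like `goal.probability.plot` (not part of the original code) would shorten each one to a single call. It assumes ggplot2 3.3.0 or later for `after_stat()`, and the hexbin package must be installed for `geom_hex()` to draw.

```r
library(ggplot2)

# Hypothetical helper: hexbin plot of predicted goal probability vs. distance
goal.probability.plot = function(data, title, fill, color) {
  ggplot(data) +
    geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
             fill = fill,
             color = color) +
    labs(title = title,
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()
}

# Toy data with made-up values, just to build the plot object
toy = data.frame(dist = c(5, 10, 20, 40), predict = c(0.3, 0.2, 0.1, 0.03))
p = goal.probability.plot(toy, "Toy Predicted Goal Probability",
                          fill = "#000000", color = "#FCB514")
```

For example, `goal.probability.plot(pit.nnet, "Pittsburgh Predicted Goal Probability from NNet Model", fill = "#000000", color = "#FCB514")` would reproduce the first Pittsburgh plot.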
Let’s look at the Pittsburgh Penguins. First, we pull out their shots.

```r
pit.nnet = get.team.data(nnet.prediction.data, "PIT")
pit.knn = get.team.data(knn.prediction.data, "PIT")
```

Now, let’s see how the Penguins fared in our models.

```r
pit.plot.nnet = ggplot(pit.nnet) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#000000",
           color = "#FCB514") +
  labs(title = "Pittsburgh Predicted Goal Probability from NNet Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()

pit.plot.knn = ggplot(pit.knn) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#000000",
           color = "#FCB514") +
  labs(title = "Pittsburgh Predicted Goal Probability from KNN Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()
```

Here is the neural network plot:

```r
pit.plot.nnet
```

Here is the k-nearest neighbors plot:

```r
pit.plot.knn
```
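Hexbin plots show where the predictions concentrate but not their exact values. A small numeric companion, sketched below, averages the predicted probability within distance bins; the bin edges are arbitrary choices and the helper name `distance.bin.means` is hypothetical. With the objects above, `distance.bin.means(pit.nnet)` and `distance.bin.means(pit.knn)` would put the Penguins’ two models side by side.

```r
# Hypothetical helper: mean predicted probability per distance bin
distance.bin.means = function(data, breaks = c(0, 10, 20, 30, 40, 60, 100, Inf)) {
  # Left-closed bins: [0,10), [10,20), ...
  bins = cut(data$dist, breaks = breaks, right = FALSE)
  # Empty bins come back as NA
  tapply(data$predict, bins, mean)
}

# Toy data with made-up values
toy = data.frame(dist = c(4, 8, 15, 35, 70),
                 predict = c(0.30, 0.20, 0.10, 0.04, 0.01))
res = distance.bin.means(toy)
res
```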
Let’s look at the Boston Bruins. First, we pull out their shots.

```r
bos.nnet = get.team.data(nnet.prediction.data, "BOS")
bos.knn = get.team.data(knn.prediction.data, "BOS")
```

Now, let’s see how the Bruins fared in our models.

```r
bos.plot.nnet = ggplot(bos.nnet) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#FFB81C",
           color = "#000000") +
  labs(title = "Boston Predicted Goal Probability from NNet Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()

bos.plot.knn = ggplot(bos.knn) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#FFB81C",
           color = "#000000") +
  labs(title = "Boston Predicted Goal Probability from KNN Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()
```

Here is the neural network plot:

```r
bos.plot.nnet
```

Here is the k-nearest neighbors plot:

```r
bos.plot.knn
```
Let’s look at the Tampa Bay Lightning. First, we pull out their shots.

```r
tbl.nnet = get.team.data(nnet.prediction.data, "T.B")
tbl.knn = get.team.data(knn.prediction.data, "T.B")
```

Now, let’s see how the Lightning fared in our models.

```r
tbl.plot.nnet = ggplot(tbl.nnet) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#002868",
           color = "#FFFFFF") +
  labs(title = "Tampa Bay Predicted Goal Probability from NNet Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()

tbl.plot.knn = ggplot(tbl.knn) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#002868",
           color = "#FFFFFF") +
  labs(title = "Tampa Bay Predicted Goal Probability from KNN Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()
```

Here is the neural network plot:

```r
tbl.plot.nnet
```

Here is the k-nearest neighbors plot:

```r
tbl.plot.knn
```
Let’s look at the San Jose Sharks. First, we pull out their shots.

```r
sjs.nnet = get.team.data(nnet.prediction.data, "S.J")
sjs.knn = get.team.data(knn.prediction.data, "S.J")
```

Now, let’s see how the Sharks fared in our models.

```r
sjs.plot.nnet = ggplot(sjs.nnet) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#006D75",
           color = "#EA7200") +
  labs(title = "San Jose Predicted Goal Probability from NNet Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()

sjs.plot.knn = ggplot(sjs.knn) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#006D75",
           color = "#EA7200") +
  labs(title = "San Jose Predicted Goal Probability from KNN Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()
```

Here is the neural network plot:

```r
sjs.plot.nnet
```

Here is the k-nearest neighbors plot:

```r
sjs.plot.knn
```
Let’s look at the Nashville Predators. First, we pull out their shots.

```r
nsh.nnet = get.team.data(nnet.prediction.data, "NSH")
nsh.knn = get.team.data(knn.prediction.data, "NSH")
```

Now, let’s see how the Predators fared in our models.

```r
nsh.plot.nnet = ggplot(nsh.nnet) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#FFB81C",
           color = "#041E42") +
  labs(title = "Nashville Predicted Goal Probability from NNet Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()

nsh.plot.knn = ggplot(nsh.knn) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#FFB81C",
           color = "#041E42") +
  labs(title = "Nashville Predicted Goal Probability from KNN Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()
```

Here is the neural network plot:

```r
nsh.plot.nnet
```

Here is the k-nearest neighbors plot:

```r
nsh.plot.knn
```
Let’s look at the Los Angeles Kings. First, we pull out their shots.

```r
lak.nnet = get.team.data(nnet.prediction.data, "L.A")
lak.knn = get.team.data(knn.prediction.data, "L.A")
```

Now, let’s see how the Kings fared in our models.

```r
lak.plot.nnet = ggplot(lak.nnet) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#111111",
           color = "#A2AAAD") +
  labs(title = "Los Angeles Predicted Goal Probability from NNet Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()

lak.plot.knn = ggplot(lak.knn) +
  geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
           fill = "#111111",
           color = "#A2AAAD") +
  labs(title = "Los Angeles Predicted Goal Probability from KNN Model",
       x = "Distance from Net",
       y = "Probability of Scoring") +
  theme_minimal()
```

Here is the neural network plot:

```r
lak.plot.nnet
```

Here is the k-nearest neighbors plot:

```r
lak.plot.knn
```
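To compare the six teams in one table rather than twelve plots, one option is to average each model’s predicted probability by team with `aggregate()` and join the two results with `merge()`. The sketch assumes frames shaped like `nnet.prediction.data` and `knn.prediction.data` (a `team` column and a `predict` column); it runs here on made-up stand-ins so it is self-contained, and `team.means` is a hypothetical name.

```r
# Made-up stand-ins for nnet.prediction.data and knn.prediction.data
nnet.pred = data.frame(team = c("PIT", "PIT", "BOS", "BOS"),
                       predict = c(0.10, 0.20, 0.05, 0.15))
knn.pred = data.frame(team = c("PIT", "PIT", "BOS", "BOS"),
                      predict = c(0.12, 0.20, 0.10, 0.12))

teams = c("PIT", "BOS", "T.B", "S.J", "NSH", "L.A")

# Mean predicted goal probability per team for each model, side by side
team.means = merge(
  aggregate(predict ~ team, data = subset(nnet.pred, team %in% teams), FUN = mean),
  aggregate(predict ~ team, data = subset(knn.pred, team %in% teams), FUN = mean),
  by = "team", suffixes = c(".nnet", ".knn"))
team.means
```

Averaging raw predictions mixes shot selection with model differences: a team that shoots from closer in will show higher means under both models.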